Word count: 2823
Airbnb was founded in 2008 and makes money by charging guests and hosts fees for short-term rentals of private homes and flats booked through its website. It started as a prototype in San Francisco and expanded rapidly. In many local markets, its arrival and growth have sparked debate about which local factors influence its listings and prices.
Part 1 combines New York Airbnb listing data with New York City demographic data. It focuses on the distribution of Airbnb listings and prices across New York and gives a preliminary economic analysis of each area using selected economic variables.
For this project, we use two datasets. First, I explore the Airbnb listing data for NYC from http://insideairbnb.com/, which contains 37,713 listings. Then I explore the demographic data for New York City neighbourhoods from the American Community Survey (https://geodacenter.github.io/data-and-lab/), which has 195 observations and 98 variables.
To ease the later analysis, the dataframe is reduced to a smaller subset for analysing prices by area, so I started by removing columns that are not used in the subsequent analysis, e.g. the number of reviews and host names. Outliers also need to be handled. However, the price of a listing depends not only on its location but also on its room type; for example, a private room will usually be more expensive than a shared room in the same location. Because so many factors influence price, we cannot simply drop price outliers from the Airbnb dataset. The statistical summary does show some listings with a price of zero, which is not possible in reality, so for now we remove only the zero-price listings.
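As a minimal Python sketch of this cleaning step (the report itself works in R), the function below keeps a handful of columns and drops listings with a zero price. The column names follow the Inside Airbnb summary `listings.csv` but are assumptions here, as is the handling of prices written like "$1,150.00" and of empty prices.

```python
import csv
import io

# Columns kept for the price-by-area analysis. These names follow the Inside
# Airbnb summary listings.csv and are assumptions; adjust them to your file.
KEEP = ("id", "latitude", "longitude", "room_type", "price")


def parse_price(raw):
    """Turn '150', '150.0' or '$1,150.00' into a float; return None if empty."""
    text = str(raw).replace("$", "").replace(",", "").strip()
    return float(text) if text else None


def clean_listings(rows):
    """Keep only the needed columns and drop listings with a zero or missing price."""
    cleaned = []
    for row in rows:
        price = parse_price(row["price"])
        if price is None or price <= 0:
            # A free (or missing) price is not realistic, so skip the listing.
            continue
        subset = {col: row[col] for col in KEEP}
        subset["price"] = price
        subset["latitude"] = float(subset["latitude"])
        subset["longitude"] = float(subset["longitude"])
        cleaned.append(subset)
    return cleaned


# Tiny in-memory example standing in for the real CSV file.
sample = io.StringIO(
    "id,name,host_name,latitude,longitude,room_type,price,number_of_reviews\n"
    "1,Loft,Ann,40.71,-73.95,Entire home/apt,150,12\n"
    "2,Room,Bob,40.68,-73.94,Private room,0,3\n"
)
listings = clean_listings(csv.DictReader(sample))
print(len(listings), listings[0]["price"])  # 1 150.0
```

Only the zero-price row is removed; other extreme prices are deliberately left in, in line with the reasoning above.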
Originally, the NYC neighbourhood shapefile was in the WGS84 geographic coordinate system (longitude/latitude). This is common for downloaded data, but its coordinates are in degrees, which makes distances and areas difficult to interpret.
To simplify the subsequent analysis, we therefore changed the CRS of the data to the New York CRS provided by QGIS, which is based on NAD83. The project also requires the number of listings and the average price per Neighbourhood Tabulation Area (NTA). However, the Airbnb listing data is a CSV file with no associated CRS, so it has to be converted into a spatial (shapefile) format, with its longitude and latitude columns declared as WGS84, and then given the same projection as the NYC neighbourhood data. I chose a projection in metres so that distances and areas can be measured with little distortion: “NAD83 / Long Island, NY” is the projection used in this analysis, as it had the least distortion in area, direction and distance for NYC.
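In R the reprojection itself is usually a single call such as `sf::st_transform()`. To show what that transformation does, here is a hand-rolled Python sketch of the ellipsoidal Lambert conformal conic (2SP) formulas from Snyder's *Map Projections: A Working Manual*. The parameters are my assumption of those for NAD83 / New York Long Island in metres (EPSG:32118), and the sketch ignores the roughly metre-level difference between the WGS84 and NAD83 datums, so it is an illustration rather than a substitute for PROJ-based tools.

```python
import math

# GRS80 ellipsoid, used by NAD83.
A = 6378137.0
INV_F = 298.257222101
E2 = (2 - 1 / INV_F) / INV_F  # first eccentricity squared, e^2 = f(2 - f)
E = math.sqrt(E2)

# Assumed EPSG:32118 (NAD83 / New York Long Island, metres) parameters;
# verify them against the EPSG registry before relying on this sketch.
LAT_1 = math.radians(41 + 2 / 60)   # first standard parallel, 41°02'N
LAT_2 = math.radians(40 + 40 / 60)  # second standard parallel, 40°40'N
LAT_0 = math.radians(40 + 10 / 60)  # latitude of origin, 40°10'N
LON_0 = math.radians(-74.0)         # central meridian, 74°W
FALSE_EASTING = 300000.0
FALSE_NORTHING = 0.0


def _m(phi):
    return math.cos(phi) / math.sqrt(1 - E2 * math.sin(phi) ** 2)


def _t(phi):
    s = E * math.sin(phi)
    return math.tan(math.pi / 4 - phi / 2) / ((1 - s) / (1 + s)) ** (E / 2)


# Cone constant n, scaling constant F and the radius rho_0 at the origin latitude.
N = (math.log(_m(LAT_1)) - math.log(_m(LAT_2))) / (math.log(_t(LAT_1)) - math.log(_t(LAT_2)))
F = _m(LAT_1) / (N * _t(LAT_1) ** N)
RHO_0 = A * F * _t(LAT_0) ** N


def to_long_island(lon_deg, lat_deg):
    """Project a longitude/latitude pair (degrees) to easting/northing in metres."""
    phi = math.radians(lat_deg)
    rho = A * F * _t(phi) ** N
    theta = N * (math.radians(lon_deg) - LON_0)
    x = FALSE_EASTING + rho * math.sin(theta)
    y = FALSE_NORTHING + RHO_0 - rho * math.cos(theta)
    return x, y


# Roughly the Empire State Building; x is a little east of 300 km, y about 64.6 km.
print(to_long_island(-73.9857, 40.7484))
```

Because the output is in metres, a difference of 0.01° of latitude near Manhattan comes out at about 1.1 km, which is what makes listing densities and areas directly interpretable.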
The NYC Airbnb listing data contains the basic information about each Airbnb in NYC, including host information, location (geometry), room type, reviews and price. The ACS (2008–2012) demographic data for New York City neighbourhoods describes population, education level, employment, poverty and the Gini coefficient in each neighbourhood. Note that the Airbnb data is recorded per individual listing, whereas much of the ACS data is aggregated per neighbourhood, such as the number of Asian residents in each neighbourhood. To combine the two datasets for economic analysis, the following analysis therefore presents the New York Airbnb listings at the neighbourhood level.
The New York City neighbourhood layer used here was obtained from https://geodacenter.github.io/data-and-lab/.
This part presents Airbnb in New York City at the neighbourhood level from two angles: the number of listings and the average price. Both maps use the Jenks natural breaks classification, a data-clustering method that chooses class boundaries so that variance within each class is minimised and variance between classes is maximised, which groups similar areas into the same range. Because the number of listings and the prices vary widely between neighbourhoods, Jenks gives class ranges that follow the natural groupings in the data, making it a reasonable choice for clear, readable maps.
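Jenks natural breaks can be computed exactly with dynamic programming (Fisher's optimal grouping): sort the values, then choose contiguous classes that minimise the total within-class sum of squared deviations. The Python sketch below is one such implementation, adequate for small inputs such as 195 NTAs since it runs in O(k·n²). In the R workflow the equivalent is tmap's `style = "jenks"`, whose implementation may differ in detail.

```python
def jenks_breaks(values, n_classes):
    """Return [min, upper bound of class 1, ..., upper bound of class k]."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any slice data[i:j] in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: smallest total SSD for splitting data[:j] into k classes.
    # cut[k][j]: index where the last of those k classes starts.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    cut[k][j] = i

    # Walk back through the cut points to recover each class's upper bound.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = cut[k][j]
    return [data[0]] + bounds[::-1]


print(jenks_breaks([1, 2, 3, 10, 11, 12, 50, 51], 3))  # [1, 3, 12, 51]
```

The three classes found here are {1, 2, 3}, {10, 11, 12} and {50, 51}, i.e. the breaks fall in the large gaps rather than at equal intervals, which is the property that makes Jenks suit skewed counts and prices.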
Number of Airbnb listings in New York City at Neighbourhood Level
First, we join the listing data to the ACS neighbourhood dataset, count the listings in each neighbourhood, and map the total number of listings per NTA with “tmap”, using a different colour for each range of listing counts. The pop-up ID is set to the neighbourhood name.
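Before counting, each listing has to be assigned to the NTA polygon that contains it (a spatial join), which is why both layers must share one projected CRS. The Python sketch below uses a ray-casting point-in-polygon test on simple polygons given as coordinate lists; the NTA names and square polygons are made up for illustration, and real NTA boundaries (multipolygons, holes) would call for a GIS library such as sf.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: toggle 'inside' each time a ray to the right of (x, y) crosses an edge."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_listings_by_nta(points, ntas):
    """Assign each (x, y) listing to the first NTA ring containing it and count them."""
    counts = dict.fromkeys(ntas, 0)
    for x, y in points:
        for name, ring in ntas.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break  # a listing is counted in at most one NTA
    return counts


# Hypothetical NTAs as squares in projected metres, plus four listings.
ntas = {
    "NTA A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "NTA B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = [(1, 1), (5, 5), (15, 5), (25, 5)]
print(count_listings_by_nta(points, ntas))  # {'NTA A': 2, 'NTA B': 1}
```

The listing at (25, 5) falls outside both polygons and is simply not counted; the resulting counts can then be attached to the ACS rows by NTA name.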
Map 1.1: Number of listings per Neighbourhood Tabulation Area (NTA)
As Map 1.1 shows, Brooklyn and Manhattan have the most Airbnb listings. This suggests greater demand in these boroughs, which is plausible because Brooklyn is the most populous borough, Manhattan is the centre of NYC, and most of New York’s attractions are located in these two areas.
For the map of average price per NTA, I used the previously merged data, grouped it by NTA name, calculated the average price per NTA, and mapped it with “tmap” using a different colour for each price range, with the pop-up ID set to the NTA information.
However, average Airbnb prices per neighbourhood must be interpreted with care, because prices depend on room type: a neighbourhood dominated by shared rooms will have a very different average price from a neighbouring area with the same number of listings but mostly entire homes. To compare average Airbnb prices across NTAs in New York City, we therefore also examine the average price of each room type in each neighbourhood. (Because the hotel room category contains very few listings and its facilities are similar to private rooms, hotel rooms and private rooms are combined into a single private category.)
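Grouping by NTA, and optionally by room category with hotel rooms folded into the private category, can be sketched in Python as follows. The `nta` field is a hypothetical name for the NTA assigned by the spatial join, and the four room-type labels are assumed to be those used by Inside Airbnb.

```python
from collections import defaultdict

# Assumed Inside Airbnb room types; hotel rooms join the private category.
ROOM_CATEGORY = {
    "Entire home/apt": "Entire home/apt",
    "Private room": "Private",
    "Hotel room": "Private",
    "Shared room": "Shared room",
}


def mean_price(listings, by_room_type=False):
    """Average price per NTA, or per (NTA, room category) when by_room_type is True."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in listings:
        key = row["nta"]  # hypothetical field filled in by the spatial join
        if by_room_type:
            key = (row["nta"], ROOM_CATEGORY[row["room_type"]])
        totals[key] += row["price"]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


listings = [
    {"nta": "NTA A", "room_type": "Entire home/apt", "price": 200.0},
    {"nta": "NTA A", "room_type": "Private room", "price": 80.0},
    {"nta": "NTA A", "room_type": "Hotel room", "price": 120.0},
    {"nta": "NTA A", "room_type": "Shared room", "price": 40.0},
]
print(mean_price(listings))  # {'NTA A': 110.0}
print(mean_price(listings, by_room_type=True))
# {('NTA A', 'Entire home/apt'): 200.0, ('NTA A', 'Private'): 100.0, ('NTA A', 'Shared room'): 40.0}
```

In this toy example the single NTA average of 110 hides room-type averages ranging from 40 to 200, which is the reason for breaking prices down by room type before comparing neighbourhoods.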